Leveraging Machine Readable Dictionaries in Discriminative Sequence Models

نویسندگان

  • Ben Wellner
  • Marc B. Vilain
چکیده

Many natural language processing tasks make use of a lexicon – typically the words collected from some annotated training data along with their associated properties. We demonstrate here the utility of corpora-independent lexicons derived from machine readable dictionaries. Lexical information is encoded in the form of features in a Conditional Random Field tagger providing improved performance in cases where: i) limited training data is made available ii) the data is case-less and iii) the test data genre or domain is different than that of the training data. We show substantial error reductions, especially on unknown words, for the tasks of part-of-speech tagging and shallow parsing, achieving up to 20% error reduction on Penn TreeBank part-of-speech tagging and up to a 15.7% error reduction for shallow parsing using the CoNLL 2000 data. Our results here point towards a simple, but effective methodology for increasing the adaptability of text processing systems by training models with annotated data in one genre augmented with general lexical information or lexical information pertinent to the target genre (or domain).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Machine-Readable Dictionaries

The papers in this panel consider machine-readable dictionaries from several perspectives: research in computational linguistics and computational lexicology, the development of tools for improving accessibility, the design of lexical reference systems for educational purposes, and applications of machine-readable dictionaries in information science contexts. As background and by way of introdu...

متن کامل

Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries

In this paper, we describe a means for automatically building very large neural networks (VLNNs) from definition texts in machine-readable dictionaries, and demonstrate the use of these networks for word sense disambiguation. Our method brings together two earlier, independent approaches to word sense disambiguation: the use of machine-readable dictionaries and connectionnist models. The automa...

متن کامل

The Dictionary Server

The term "machine-readable dictionary" can clearly be taken in two ways. In its stronger and better established interpretation, it presumably refers to dictionaries intended for machine consumption and use as in a language processing system of some sort. In a somewhat weaker sense, it has to do with dictionaries intended for human consumption, but through the intermediary of a machine. Ideally,...

متن کامل

Extracting Knowledge Bases from Machine- Readable Dictionaries : Have We Wasted Our Time?

Machine-readable versions of everyday dictionaries have been seen as a likely source of information for use in natural language processing because they contain an enormous amount of lexical and semantic knowledge. However, after 15 years of research, the results appear to be disappointing. No comprehensive evaluation of machine-readable dictionaries (MRDs) as a knowledge source has been made to...

متن کامل

An Approach to Building the Hierarchical Element of a Lexical Knowledge Base from a Machine Readable Dictionary. an Approach to Building the Hierarchical Element of a Lexical Knowledge Base from a Machine Readable Dictionary 1

This abstract describes an approach to extracting taxonomies from machine readable dictionaries and using them to structure a lexical knowledge base which incorporates default inheritance. Taxonomy construction is based on an intuitive notion of the organisation of the substantial quantities of data in machine readable dictionaries which were developed for quite independent purposes. Our intent...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006